Causal discovery from databases with discrete and continuous variables
Abstract
Causal discovery is widely used in the exploratory analysis of experimental data to suggest probable causal dependencies. The literature offers a variety of causal discovery algorithms. Some of them assume that the model contains no latent variables; others provide no scoring metric for comparing the reliability of two candidate models. Bayesian Constraint-based Causal Discovery (BCCD) [1] is a state-of-the-art algorithm that tries to combine the strengths of the best algorithms in the field. BCCD can detect latent variables in the model and estimates the reliability of the edges between variables, which makes it easy to compare alternative models. The idea of BCCD is to estimate the reliability of causal relations by scoring Directed Acyclic Graphs (DAGs) over small subsets of variables with a Bayesian score, and then to combine these statements to infer a final causal model. For discrete variables the Bayesian score has a closed-form solution, which makes scoring causal relations fast and efficient. BCCD is currently limited to discrete or Gaussian variables because there is no closed-form solution of the Bayesian score for a mixture of discrete and continuous variables. Extending BCCD to such mixtures therefore requires a new scoring method for estimating the reliability of causal relations. Several scoring methods for mixtures of discrete and continuous variables exist in the literature. Most of them either rely on strict assumptions about the network structure that do not hold in practice, such as forbidding a discrete variable from having a continuous parent, or are time-consuming and/or memory-inefficient.
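To illustrate what a closed-form Bayesian score for discrete variables can look like, the sketch below computes the log marginal likelihood of one discrete node given its parents under a Dirichlet (BDeu-style) prior. This is an assumption made for illustration: the abstract does not say which Bayesian score BCCD uses, and the function name `log_bd_family_score`, its parameters, and the equivalent sample size `ess` are hypothetical.

```python
from collections import Counter
from math import lgamma


def log_bd_family_score(child, parents, child_card, parent_cards, ess=1.0):
    """Log marginal likelihood of one discrete node given its parents.

    Assumes a BDeu-style Dirichlet prior (an illustrative choice, not
    necessarily the score used by BCCD).

    child        -- list of int states of the child, one per record
    parents      -- list of tuples with the parent states of each record
                    (use empty tuples for a node without parents)
    child_card   -- number of states of the child
    parent_cards -- list with the number of states of each parent
    ess          -- equivalent sample size of the prior (hypothetical default)
    """
    # Number of joint parent configurations.
    q = 1
    for card in parent_cards:
        q *= card

    a_j = ess / q            # prior pseudo-count per parent configuration
    a_jk = a_j / child_card  # prior pseudo-count per (configuration, state)

    n_j = Counter(parents)               # records per parent configuration
    n_jk = Counter(zip(parents, child))  # records per (configuration, state)

    # Unobserved configurations contribute zero, so only observed counts
    # need to be visited.
    score = 0.0
    for n in n_j.values():
        score += lgamma(a_j) - lgamma(a_j + n)
    for n in n_jk.values():
        score += lgamma(a_jk + n) - lgamma(a_jk)
    return score


# Usage: a child that copies its binary parent exactly.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = list(x)
with_parent = log_bd_family_score(y, [(v,) for v in x], 2, [2])
without_parent = log_bd_family_score(y, [()] * len(y), 2, [])
```

Because the score is a sum of log-gamma terms over observed counts, it needs a single pass over the data and no numerical integration. In this toy example the family with the parent should score higher than the one without, reflecting the dependence in the data. No comparably simple expression is available once discrete and continuous variables are mixed, which is the gap the abstract describes.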
Similar resources
Causal Discovery from Databases with Discrete and Continuous Variables
Bayesian Constraint-based Causal Discovery (BCCD) is a state-of-the-art method for robust causal discovery in the presence of latent variables. It combines probabilistic estimation of Bayesian networks over subsets of variables with a causal logic to infer causal statements. Currently BCCD is limited to discrete or Gaussian variables. Most of the real-world data, however, contain a mixture of d...
Combining Linear Non-Gaussian Acyclic Model with Logistic Regression Model for Estimating Causal Structure from Mixed Continuous and Discrete Data
Estimating causal models from observational data is a crucial task in data analysis. For continuous-valued data, Shimizu et al. have proposed a linear acyclic non-Gaussian model to understand the data generating process, and have shown that their model is identifiable when the number of data points is sufficiently large. However, situations in which continuous and discrete variables coexist in the same...
The Discovery of Generalized Causal Models with Mixed Variables Using MML Criterion
One major difficulty frustrating the application of linear causal models is that they are not easily adapted to cope with discrete data. This is unfortunate since most real problems involve both continuous and discrete variables. In this paper, we consider a class of graphical models which allow both continuous and discrete variables, and propose the parameter estimation method and a structure ...
Continuous Discrete Variable Optimization of Structures Using Approximation Methods
Optimum design of structures is achieved while the design variables are continuous and discrete. To reduce the computational work involved in the optimization process, all the functions that are expensive to evaluate, are approximated. To approximate these functions, a semi quadratic function is employed. Only the diagonal terms of the Hessian matrix are used and these elements are estimated fr...
Causal Induction from Continuous Event Streams
Three experiments investigated the impact of delay on human causal learning. We present a new paradigm based on the presentation of continuous event streams, and use it to test two hypotheses drawn from associative learning theories of causal inference. Unlike free-operant procedures traditionally used to study temporal aspects of causal learning (Shanks, Pearson, & Dickinson, 1989; Shanks & Di...